1
Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information ...
Abstract:
Deep learning-based pronunciation scoring models rely heavily on the availability of annotated non-native data, which is costly to collect and scales poorly. To address this data scarcity, data augmentation is commonly used for model pretraining. In this paper, we propose phone-level mixup, a simple yet effective data augmentation method, to improve the performance of word-level pronunciation scoring. Specifically, given a phoneme sequence from the lexicon, an artificially augmented word sample is generated by randomly sampling from the corresponding phone-level features in the training data, and the word score is the average of their GOP scores. Because phones can be combined arbitrarily, the mixup can generate any word with a variety of pronunciation scores. Moreover, we utilize multi-source information (e.g., MFCC and deep features) to further improve the performance of the scoring system. The experiments conducted on Speechocean762 show that the proposed system outperforms the baseline by ... : 5 pages, 2 figures. This paper is submitted to INTERSPEECH 2022 ...
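The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `phone_level_mixup`, the `phone_bank` layout (phone label mapped to a list of feature/GOP-score pairs), and the use of plain Python lists as features are all assumptions made for illustration. It only shows the idea stated above, which is to pick one training example per phone at random and average the picked GOP scores to obtain the word score.

```python
import random
from statistics import mean


def phone_level_mixup(phonemes, phone_bank, rng=None):
    """Build one synthetic word sample from a phoneme sequence.

    phonemes:   phone labels of the target word, e.g. taken from a lexicon.
    phone_bank: hypothetical dict mapping each phone label to a list of
                (feature, gop_score) pairs collected from training data.
    rng:        optional random.Random instance for reproducibility.

    Returns (features, word_score), where features holds one sampled
    phone-level feature per phoneme and word_score is the mean of the
    sampled GOP scores.
    """
    if rng is None:
        rng = random.Random()
    # Sample one training occurrence for every phone in the word.
    picked = [rng.choice(phone_bank[p]) for p in phonemes]
    features = [feat for feat, _ in picked]
    # Word-level label: average of the phone-level GOP scores.
    word_score = mean(score for _, score in picked)
    return features, word_score
```

Calling the function repeatedly with the same phoneme sequence would yield different feature combinations and word scores whenever a phone has more than one entry in the assumed `phone_bank`.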
Keyword:
Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG
URL: https://arxiv.org/abs/2203.01826 https://dx.doi.org/10.48550/arxiv.2203.01826
BASE

2
The Influence of English as a Foreign Language Teachers’ Positive Mood and Hope on Their Academic Buoyancy: A Theoretical Review
In: Front Psychol (2022)
BASE

3
Fine-grained style control in Transformer-based Text-to-speech Synthesis ...
BASE

4
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning ...
BASE

5
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021 ...
BASE

6
Measurement of single-diffractive dijet production in proton-proton collisions at $\sqrt{s} = 8$ TeV with the CMS and TOTEM experiments
In: Eur.Phys.J.C ; https://hal.archives-ouvertes.fr/hal-02507664 ; Eur.Phys.J.C, 2020, 80 (12), pp.1164. ⟨10.1140/epjc/s10052-020-08562-y⟩ (2020)
BASE

9
Measurement of the top quark mass with lepton+jets final states using $\mathrm{pp}$ collisions at $\sqrt{s} = 13\,\text{TeV}$
In: http://infoscience.epfl.ch/record/275278 (2020)
BASE

13
Joint event extraction based on hierarchical event schemas from FrameNet
BASE

14
Requests made by Australian learners of Chinese as a foreign language
BASE

16
Measurement of prompt and nonprompt charmonium suppression in $\text{PbPb}$ collisions at $5.02\,\text{TeV}$
In: Eur.Phys.J.C ; https://hal.archives-ouvertes.fr/hal-01833739 ; Eur.Phys.J.C, 2018, 78 (6), pp.509. ⟨10.1140/epjc/s10052-018-5950-6⟩ (2018)
BASE

17
Additional file 1: of Spatial and temporal clustering analysis of tuberculosis in the mainland of China at the prefecture level, 2005–2015 ...
BASE

18
Additional file 1: of Geographic distribution of echinococcosis in Tibetan region of Sichuan Province, China ...
BASE

19
Additional file 1: of Geographic distribution of echinococcosis in Tibetan region of Sichuan Province, China ...
BASE

20
Additional file 1: of Spatial and temporal clustering analysis of tuberculosis in the mainland of China at the prefecture level, 2005–2015 ...
BASE